61 research outputs found

    Active Inverse Reward Design

    Designers of AI agents often iterate on the reward function in a trial-and-error process until they get the desired behavior, but this only guarantees good behavior in the training environment. We propose structuring this process as a series of queries asking the user to compare different reward functions. This lets us actively select queries for maximum informativeness about the true reward. In contrast to approaches that ask the designer for optimal behavior, it allows us to gather additional information by eliciting preferences between suboptimal behaviors. After each query, we update the posterior over the true reward function from observing the proxy reward function chosen by the designer, which the recently proposed Inverse Reward Design (IRD) makes possible. Our approach substantially outperforms IRD in test environments. In particular, it can query the designer about interpretable, linear reward functions and still infer non-linear ones.
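    A minimal sketch of the active query-selection idea described above, assuming a discrete hypothesis space of true reward weights and an IRD-style choice model in which the designer soft-maximally picks the proxy whose induced behavior scores best under the true reward. The feature-count stand-in and all names are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hyp, n_feat = 50, 3
true_hypotheses = rng.normal(size=(n_hyp, n_feat))   # candidate "true" reward weights
posterior = np.full(n_hyp, 1.0 / n_hyp)              # uniform prior over hypotheses

def feature_counts(proxy_w):
    # Toy stand-in for the feature counts of behaviour optimised for proxy_w in the
    # training environment: the agent collects more of what it is paid for.
    return proxy_w / (np.linalg.norm(proxy_w) + 1e-8)

def answer_likelihoods(query, beta=5.0):
    """P(designer picks proxy i | true reward w) for each hypothesis w.
    query: (k, n_feat) proxy weights; returns an (n_hyp, k) probability matrix."""
    phis = np.stack([feature_counts(q) for q in query])   # (k, n_feat)
    logits = beta * true_hypotheses @ phis.T               # true return of each proxy's behaviour
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def expected_info_gain(query, posterior):
    lik = answer_likelihoods(query)          # (n_hyp, k)
    p_answer = posterior @ lik               # marginal probability of each answer
    gain = entropy(posterior)
    for i, pa in enumerate(p_answer):
        post_i = posterior * lik[:, i]
        post_i /= post_i.sum()
        gain -= pa * entropy(post_i)
    return gain

# Actively pick the most informative query from a pool of candidate proxy-reward pairs.
candidate_queries = [rng.normal(size=(2, n_feat)) for _ in range(20)]
best_query = max(candidate_queries, key=lambda q: expected_info_gain(q, posterior))
```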

    Reducing Exploitability with Population Based Training

    Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games. Yet prior work has found that policies that are highly capable against regular opponents can fail catastrophically against adversarial policies: opponents trained explicitly against the victim. Prior defenses using adversarial training were able to make the victim robust to a specific adversary, but the victim remained vulnerable to new ones. We conjecture this limitation was due to insufficient diversity of adversaries seen during training. We propose a defense using population based training to pit the victim against a diverse set of opponents. We evaluate this defense's robustness against new adversaries in two low-dimensional environments. Our defense increases robustness against adversaries, as measured by the number of attacker training timesteps needed to exploit the victim. Furthermore, we show that robustness is correlated with the size of the opponent population. Comment: Presented at the New Frontiers in Adversarial Machine Learning Workshop, ICML 2022.
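    The defense described above can be sketched as a simple alternating loop: each generation, a fresh adversary is trained to exploit the current victim and added to a growing population, and the victim is then trained against opponents sampled from the whole population. The sketch below only shows this loop structure; make_policy and train_against are placeholders (assumptions), not the paper's actual RL training code.

```python
import random

def make_policy(name):
    # Placeholder policy object; in practice this would be an RL policy network.
    return {"name": name, "updates": 0}

def train_against(learner, opponents, steps):
    # Placeholder for RL updates of `learner` in a two-player environment, with the
    # opponent resampled from `opponents` at the start of each episode.
    for _ in range(steps):
        opponent = random.choice(opponents)
        learner["updates"] += 1            # stand-in for a gradient step against `opponent`

victim = make_policy("victim")
population = [make_policy("adversary_0")]

for generation in range(5):
    # Attacker phase: train a fresh adversary purely to exploit the current victim.
    adversary = make_policy(f"adversary_{generation + 1}")
    train_against(adversary, [victim], steps=1000)
    population.append(adversary)

    # Defender phase: train the victim against the whole, diverse population rather
    # than only the newest attacker, to avoid overfitting the defense to one opponent.
    train_against(victim, population, steps=1000)
```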

    imitation: Clean Imitation Learning Implementations

    imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL) algorithms, three imitation learning algorithms, and a preference comparison algorithm. The implementations have been benchmarked against previous results, and automated tests cover 98% of the code. Moreover, the algorithms are implemented in a modular fashion, making it simple to develop novel algorithms in the framework. Our source code, including documentation and examples, is available at https://github.com/HumanCompatibleAI/imitation.
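    For context, the sketch below is a bare-bones behavioural-cloning loop in PyTorch, the kind of algorithm the package implements in a more general and tested form. It is not the imitation library's API; the data are random stand-in "expert" transitions with assumed shapes.

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 8, 4
expert_obs = torch.randn(1024, obs_dim)               # stand-in expert observations
expert_acts = torch.randint(0, n_actions, (1024,))    # stand-in expert actions

# Small policy network mapping observations to action logits.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    logits = policy(expert_obs)
    loss = loss_fn(logits, expert_acts)    # maximise likelihood of the expert's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```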

    Adversarial Policies Beat Superhuman Go AIs

    We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at https://goattack.far.ai/. Comment: Accepted to ICML 2023; see paper for changelog.
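    The essence of such an attack is that the victim is frozen and treated as part of the environment, so the attacker only has to find moves that exploit that specific policy's blind spots rather than play well in general. The toy below illustrates this with a trivial game (Nim) and a crude random search in place of reinforcement learning; nothing here is KataGo or the authors' training setup.

```python
import random

def victim_policy(stones):
    # A frozen, mostly-sensible heuristic with an exploitable blind spot (never takes 3).
    return min(2 if stones % 4 == 3 else 1, stones)

def play_episode(attacker_policy, start_stones):
    stones = start_stones
    while True:
        stones -= min(attacker_policy(stones), stones)   # attacker moves first
        if stones == 0:
            return 1.0                                   # attacker took the last stone: win
        stones -= victim_policy(stones)                  # frozen victim replies
        if stones == 0:
            return 0.0                                   # victim wins

# Crude stand-in for adversarial-policy training: search lookup-table policies
# for one that exploits the frozen victim.
best_rate = -1.0
for _ in range(2000):
    table = {s: random.randint(1, 3) for s in range(1, 21)}
    policy = lambda s, t=table: t[s]
    rate = sum(play_episode(policy, random.randint(5, 20)) for _ in range(50)) / 50
    best_rate = max(best_rate, rate)

print(f"best attacker win rate found against the frozen victim: {best_rate:.2f}")
```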

    Surgical Treatment of Renal Cell Cancer Liver Metastases: A Population-Based Study

    Background: To evaluate outcomes of surgical treatment in patients with hepatic metastases from renal-cell carcinoma in the Netherlands, and to identify prognostic factors for survival after resection. Renal-cell carcinoma has an incidence of 2,000 new patients per year in the Netherlands (12.5/100,000 inhabitants). According to the literature, half of these patients ultimately develop distant metastases, with the liver involved in 20%. Resection of renal-cell carcinoma liver metastases (RCCLM) is performed in only a minority of patients. Hence, little is known about the outcome of resectable RCCLM. Methods: Patients were retrieved from the local databases of the Netherlands Task Force for Liver Surgery (14 centers) and from the Dutch collective pathology database. Survival and prognostic factors were determined by Kaplan-Meier analysis and the log-rank test. Results: Thirty-three patients were identified who underwent resection (n = 29) or local ablation (n = 4) of RCCLM in the Netherlands between 1990 and 2008. These patients comprise 0.5% to 1% of the total population of patients diagnosed with RCCLM in that period. There was no operative mortality. Overall survival at 1, 3, and 5 years was 79%, 47%, and 43%, respectively. Metachronous metastases (n = 23, P = 0.03) and radical resection (n = 19, P < 0.001) were statistically significant prognosticators of overall survival.
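    Illustrative only: the kind of Kaplan-Meier estimate and log-rank comparison used in this study, run on made-up numbers since the patient-level data are not included here. It assumes the Python lifelines package, and the grouping variable is a stand-in for a prognostic factor such as metachronous versus synchronous metastases.

```python
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

# Toy cohort: survival time in months, event indicator (1 = death observed),
# and a binary prognostic factor splitting patients into two groups.
months   = [12, 36, 60, 8, 24, 48, 30, 15, 70, 5]
observed = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
group    = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

months_a = [m for m, g in zip(months, group) if g]
months_b = [m for m, g in zip(months, group) if not g]
events_a = [e for e, g in zip(observed, group) if g]
events_b = [e for e, g in zip(observed, group) if not g]

# Kaplan-Meier survival curve for one group.
kmf = KaplanMeierFitter()
kmf.fit(months_a, event_observed=events_a)
print(kmf.survival_function_)

# Log-rank test between the two prognostic groups.
result = logrank_test(months_a, months_b,
                      event_observed_A=events_a, event_observed_B=events_b)
print(result.p_value)
```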